ABSTRACT. The article describes a symbolic approach to visual information processing, and
sets out four principles that appear to govern the design of complex symbolic information
processing systems. A computational theory of early visual information processing is
presented, which extends to about the level of figure-ground separation. It includes a
process-oriented theory of texture vision. Most of the theory has been implemented, and
examples are shown of the analysis of several natural images. This replaces Memos 324 and
334.
Summary


1. An introduction is given to a theory of early visual information processing. The theory has
been implemented, and examples are given of images at various stages of analysis.

2. It is argued that the first step of consequence is to compute a primitive but rich description
of the gray-level changes present in an image. The description is expressed in a vocabulary
of kinds of intensity change (EDGE, SHADING-EDGE, EXTENDED-EDGE, LINE, BLOB etc.).
Modifying parameters are bound to the elements in the description, specifying their POSITION,
ORIENTATION, TERMINATION points, CONTRAST, SIZE and FUZZINESS. This description is
obtained from the intensity array by fixed techniques, and it is called the primal sketch.

3. For most images, the primal sketch is large and unwieldy. The second important step in
visual information processing is to group its contents in a way that is appropriate for later
recognition.

4. From our ability to interpret drawings with little semantic content, one may infer the
presence in our perceptual equipment of symbolic processes that can define "place-tokens" in
an image in various ways, and can group them according to certain rules. Homomorphic
techniques fail to account for many of these grouping phenomena, whose explanations require
mechanisms of construction rather than mechanisms of detection.


5. The necessary grouping of elements in the primal sketch may be achieved by a mechanism
that has available the processes inferred from (4), together with the ability to select items by
first-order discriminations acting on the elements' parameters. Only occasionally do these
mechanisms use downward-flowing information about the contents of the particular image
being processed.

6. It is argued that "non-attentive" vision is in practice implemented by these grouping
operations and first-order discriminations acting on the primal sketch. The class of
computations so obtained differs slightly from the class of second-order operations on the
intensity array.

7. The extraction of a form from the primal sketch using these techniques amounts to the
separation of figure from ground. It is concluded that most of the separation can be carried
out using techniques that do not depend upon the particular image in question. Therefore,
figure-ground separation can normally precede the description of the shape of the extracted
form.


8. Up to this point, higher-level knowledge and purpose are brought to bear on only a few of
the decisions taken during the processing. This relegates the widespread use of downward-
flowing information to a later stage than is found in current machine-vision programs, and
implies that such knowledge should influence the control of, rather than interfering with, the
actual data-processing that is taking place lower down.


Introduction

The vision problem begins with a large gray-level intensity array, and
culminates in a description that depends on that array, and on the purpose for which it is
being viewed. The question of interest is what has to go on in between. This article outlines
the first part of a theory of visual information processing, and covers the analysis up to about
the level of figure-ground separation. The theory is restricted to single frame,
monochromatic, monocular images without specularities, reflections, translucency, transparency
or light sources. It is argued that the first step of consequence is to compute a primitive but
rich description of the gray-level changes present in an image, and that all subsequent
computations are implemented as manipulations of that description. The description itself is
called the Primal Sketch. The processes that compute it, and most of the processes that
operate directly on it, do not depend significantly upon the particular contents of the image.
The control of these processes may.

The approach taken here rests upon the observation that a drawing of a scene
adequately represents the scene, despite the very different gray-level image to which it gives
rise. It therefore seems reasonable to suppose that the artist's local symbols are in
correspondence with natural symbols, that are computed out of the image during the normal
course of its interpretation. The idea that visual processing should commence with the
extraction of a more or less elaborate line-drawing is not a new one, but its successful
implementation has proved elusive. Several edge-detection algorithms have been proposed
(Hueckel (1971 & 1973), Macleod (1970), Rosenfeld & Thurston (1971), Rosenfeld, Thurston &
Lee, Horn (1973)), but as their proliferation suggests, the results of applying them to
natural images have proved generally unsatisfactory. This has led some to believe that an
adequate line-drawing of a scene cannot be computed unless hypotheses about what is
present are allowed to influence quite early stages in the processing (Shirai 1973, Freuder
1975).

How much independent pre-processing can usefully be carried out? Do the
different stages in recognition have to interact in a rich and complex way, or may they be
implemented in modules that are to a first approximation independent? These questions do
not depend upon the particular hardware (wet or dry) in which the processing is implemented.
We need to answer them before we can address "higher-level" problems, because the nature
of the answers determines the overall strategy that subsequent processes must employ.

General principles

Several lessons have been learnt over the last ten years from the experience
of designing and implementing large symbolic computer programs. These lessons may be
expressed as four principles for the organization of complex symbolic processes. Because I
shall need to refer to them, and because recognition and other advanced biological
computations are complex symbolic processes, I take the liberty of setting out these
principles here.

1: Principle of explicit naming

Whenever a collection of data is to be described, discussed or manipulated as a
whole, it should first be given a name. This forms the data into an entity in its own right,
permits properties to be assigned to it, and allows other structures and processes to refer to
it. The act of naming is the distinguishing mark of symbolic computation, and this insight was
the single most important idea behind the invention of the programming language called LISP
(McCarthy et al. 1963).

2: Principle of modular design

Any large computation should be split up and implemented as a collection of
small sub-parts that are as nearly independent of one another as the overall task allows. If a
process is not designed in this way, a small change in one place will have consequences in
many other places. This means that the process as a whole becomes extremely difficult to
debug or to improve, whether by a human designer or in the course of natural evolution,
because a small change to improve one part has to be accompanied by many simultaneous
compensating changes elsewhere.

3: Principle of least commitment

The principle of least commitment states that one should never do something
that may later have to be undone, and I believe that it applies to all situations in which
performance is fluent. It is frequently the case during the execution of a recognition task that
there are a number of possible interpretations of a particular datum, but that there is not yet
sufficient evidence to decide between them. In such cases, one should never become
committed to one of the possibilities prematurely, because of the damage that knowledge
associated with that possibility and not with the others can subsequently do.

There are two escapes from situations in which the principle is about to be
violated. One is to 'wait and see', hopeful that the rival possibilities can be maintained
without causing memory overflow until information becomes available that can select the
correct interpretation. Marcus (1974) has conjectured that the structure of English syntax is
such that a wait-and-see parser never has to wait very long before seeing. The other escape
is to restructure the problem, by breaking the computation into more steps, by increasing the
vocabulary for expressing the possible choices, and by adding more diagnostics for deciding
between rival possibilities. The sheer volume of information rules out a wait-and-see
approach to early visual processing, so only the second alternative is a real option there. My
experience has been that if one has to disobey the principle of least commitment, one is
either doing something wrong, or something very difficult.

An application of the principle is frequently accompanied by a particular style
of computation called constraint analysis, or filtering. We shall meet it later in this article.
Where several possibilities compete for the privilege of describing a particular datum, there
usually exist constraints or measures of preference that operate among them. The act of
filtering the possibilities using the constraints is a distinctive style of computation, somewhat
reminiscent of relaxation techniques for solving complex problems in structural engineering.
Constraint analysis was first used effectively in a vision program by Waltz (1972). A neural
implementation of essentially this technique was given by Marr (1971 section 3.1.2).

4: Principle of graceful degradation

The final principle is designed to ensure that wherever possible, degrading the
data will not prevent one from delivering at least some of the answer. It amounts to a
condition on the continuity of the relation between descriptions computed at different stages
in the processing. For example, it would be foolish not to require that a "rough" two-
dimensional description, of the kind that a vision system might compute out of a drawing,
should enable it to compute a "rough" three-dimensional description of what the drawing
represents.

Early Processing: computing the primal sketch

The primal sketch consists of a primitive but rich description of the intensity
changes that are present in an image. This description consists of a set of assertions,
expressed in terms of a vocabulary of symbols and modifiers that are powerful enough to
capture all of the important information in an intensity array. An example of such an
assertion might be

(SHADING-EDGE (POSITION ([illegible] 45) (73 48))
              (CONTRAST [illegible])
              (FUZZINESS 17)
              (ORIENTATION 0))

The design of a method for achieving this rests on two primary decisions: what
types of intensity change are to be detected, and how expressive is the vocabulary in terms
of which those changes are to be described?

One-dimensional intensity profiles

In an empirical study, Herskovits & Binford (1970 pp 19, 53, 55) found that the
most common intensity changes in images of scenes composed of polyhedral objects were step
changes, bumps, and roof-shaped profiles. Our experience adds some others for more general
scenes (see figure 2). The detection of roof-shaped intensity changes requires a sensitivity
to changes in intensity gradient. The human visual system has long been known to be
sensitive to such changes (Mach bands, Ratliff 1965), but of the edge-detection algorithms
referred to in the introduction, only the Binford-Horn line-finder (Horn 1973) incorporates a
sensitivity to the second derivative of intensity. There is no evidence that humans are
sensitive to higher derivatives.

A competent edge-finder therefore needs to be sensitive to discontinuities in
intensity and in intensity gradient, or (roughly) to measure the first and second derivatives of
intensity everywhere. Approximations to these quantities may conveniently be obtained by
convolving the image locally with "edge-shaped" and "bar-shaped" masks (see figure 2a). This
follows from the fact that an edge-shaped mask measures an approximation to the local
intensity gradient in a particular direction. A bar-mask may be thought of as composed of
two adjacent edge-masks with opposite signs. It therefore measures approximately the local
change in intensity gradient.
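
As a rough illustration of these two measurements, the following Python sketch builds a
one-dimensional edge-mask and bar-mask and slides them along an intensity profile. The
function names, the normalization by panel width, and the test profiles are assumptions made
for the example; they are not taken from the implementation described in this article.

```python
def edge_mask(width):
    """Two adjacent panels of `width` samples: negative on the left, positive on the right.

    Dividing by the panel width (an assumption) makes the response equal to the
    difference between the mean intensities under the two panels.
    """
    return [-1.0 / width] * width + [1.0 / width] * width


def bar_mask(width):
    """Two adjacent edge-masks with opposite signs, as described in the text."""
    first = edge_mask(width)
    second = [-v for v in first]
    return first + second


def correlate(profile, mask):
    """Slide the mask along the profile; one value per fully overlapping position."""
    n = len(mask)
    return [
        sum(m * p for m, p in zip(mask, profile[i:i + n]))
        for i in range(len(profile) - n + 1)
    ]


# Hypothetical test profiles: a step of height 10 and a linear ramp.
step = [0.0] * 20 + [10.0] * 20
ramp = [float(i) for i in range(40)]

edge_on_step = correlate(step, edge_mask(4))   # approximates the first derivative
bar_on_step = correlate(step, bar_mask(4))     # approximates the change in gradient
bar_on_ramp = correlate(ramp, bar_mask(4))     # a constant gradient gives no response
```

On the step profile the bar-mask record contains two peaks of equal magnitude and opposite
sign, while on the ramp it is zero everywhere, which is the behaviour the argument above
relies on.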

This argument defines the types of intensity change that are to be detected,
but it is important to note that simply making the measurements is not enough. Almost every
point in almost every natural image gives rise to a non-zero convolution value with almost


FIGURE 1. Selecting the appropriate mask-size from which to compute the description of an
intensity change. The figure illustrates the convolution of "edge-shaped" masks of three sizes
with different intensity distributions. The masks are shown to scale on the left, and the
widths of their panels are 10, 25, and 60 units. The three intensity distributions are a step
function (a), a function that increases linearly over 100 units and is constant elsewhere (b),
and a step change 10 units wide superimposed on the linear one (c). Convolutions with each
of these distributions are exhibited opposite each mask. In (d), the peak height that occurs in
each convolution has been plotted against mask size for each intensity distribution; trace 1
corresponds to distribution (a), trace 2 to distribution (b), and trace 3 to distribution (c). The
selection criterion chooses a mask size if it corresponds to a peak or to the left-hand end of a
near plateau in the graph (d). Some distributions cause two mask sizes to be selected.
Distribution (c) is one of these. The mask sizes selected for it are 10, and a value near 90.
FIGURE 2. Classifying the gray-level intensity changes present in an image. Examples of the
edge- and bar-masks that were used appear in (a). The text classifies the possible
configurations of peak patterns in edge- and bar-mask convolution profiles, and this
classification is illustrated by (b) - (f). Edge-mask profiles are marked with a 1, and bar-mask
profiles (second derivative) with a 2. The classes are EDGE (b), EXTENDED-EDGE (c), BAR
(Mach band) (d), LINE (e) and SHADING-EDGE (f). Intermediate terms are used when the
processor fails to find sufficient peaks to determine the edge type.
FIGURE 3. The intensity distribution exhibited in (a), whose profile appears in (b), was obtained
by illuminating a curved piece of white paper from one end, and viewing it from above. Its
description, computed using an edge-mask of panel-width 8 (c), and bar-masks of panel-widths
4 (d) and 8 (e), is as follows:
EDGE (POSITION 80) (CONTRAST 13[illegible]) (FUZZ SHARP)
EDGE (POSITION 212) (CONTRAST 3) (FUZZ 4)
EDGE (POSITION 232) (CONTRAST 2) (FUZZ SHARP)
EDGE (POSITION 435) (CONTRAST -3) (FUZZ 4)
EDGE (POSITION 444) (CONTRAST 25) (FUZZ 5)
EDGE (POSITION 464) (CONTRAST 2) (FUZZ 4)
EDGE (POSITION 490) (CONTRAST 1) (FUZZ 4)
EXTENDED-EDGE (POSITION 582) (CONTRAST -12) (FUZZ 9)
(the peaks giving rise to this edge are marked with arrows)
EDGE (POSITION 624) (CONTRAST -20) (FUZZ 6)
EDGE (POSITION 676) (CONTRAST 3) (FUZZ 4)
EDGE (POSITION 684) (CONTRAST [sign illegible]4) (FUZZ 4)
SHADING-EDGE (POSITION 570) (CONTRAST -14) (WIDTH 67)
SHADING-EDGE (POSITION 391) (CONTRAST [illegible]) (WIDTH 36)
SHADING-EDGE (POSITION 339) (CONTRAST -8) (WIDTH 73)
every size and orientation of edge-mask. We therefore have to compute from this mass of
data some symbol that represents a local piece of edge, and it is this symbol that will then
stand in correspondence with a line segment in an artist's drawing. Fortunately, we can make
a great simplification at this stage in the analysis. Provided that measurements are made with
masks of two or more sizes, the positions and sizes of the peaks in the measurements provide
enough information to compute the description of the underlying intensity changes.
Furthermore, provided that a group of peaks is sufficiently isolated from other peaks, the
other peaks may be ignored when analyzing that group.

The reason for this is illustrated in figure 1, which shows the difference
between edge-mask values obtained using masks of three different sizes on a step change in
intensity (1a), and on a gradual change (1b). The results are analogous to the power spectra
of the different kinds of edge. Step changes are "seen" equally well by all sizes of mask.
Gradual changes are seen increasingly faintly by edge-shaped masks whose dimensions are
smaller than the distance over which the intensity change is taking place. Figure 1d shows
this effect in graphic form by plotting the maximum (absolute) edge-mask value against the
mask width. Trace 1 arises from the step change (figure 1a), and trace 2 arises from the
gradual intensity change (figure 1b). A good estimate of the spatial extent ("fuzziness") of an
edge may be made by finding the mask size at which the edge-mask response starts to
diminish. Accordingly the following criterion is used.

Selection criterion: mask size s is selected at point P in the image whenever (a) masks
slightly smaller than s give an appreciably smaller peak at P, and (b) slightly larger masks give
a peak that is not appreciably larger.
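
The criterion can be sketched in Python as follows. This is only one possible reading:
the text leaves "appreciably" to the implementing algorithm, so the relative `tolerance`
used here, the function name, and the treatment of the smallest and largest measured masks
are all assumptions of the example.

```python
def select_mask_sizes(peak_by_size, tolerance=0.15):
    """Return the mask sizes that satisfy the selection criterion at one image point.

    peak_by_size maps a mask size to the peak height measured with that mask.
    Assumptions: "appreciably" means a relative difference above `tolerance`;
    below the smallest measured mask the response is taken as zero; above the
    largest measured mask the response is taken as unchanged.
    """
    sizes = sorted(peak_by_size)
    selected = []
    for k, s in enumerate(sizes):
        here = abs(peak_by_size[s])
        if here == 0:
            continue
        smaller = abs(peak_by_size[sizes[k - 1]]) if k > 0 else 0.0
        larger = abs(peak_by_size[sizes[k + 1]]) if k + 1 < len(sizes) else here
        # (a) slightly smaller masks give an appreciably smaller peak
        grows_up_to_here = smaller < here * (1.0 - tolerance)
        # (b) slightly larger masks give a peak that is not appreciably larger
        levels_off_after = larger <= here * (1.0 + tolerance)
        if grows_up_to_here and levels_off_after:
            selected.append(s)
    return selected


# Hypothetical peak heights, loosely shaped like traces 1 and 2 of figure 1d.
step_trace = {5: 10.0, 10: 10.0, 20: 10.0, 40: 10.0}
gradual_trace = {10: 1.0, 20: 2.0, 40: 4.0, 80: 8.0, 100: 8.7, 120: 9.0}

sharp = select_mask_sizes(step_trace)
fuzzy = select_mask_sizes(gradual_trace)
```

With these made-up numbers the step change selects the smallest mask (the left-hand end of a
plateau) and the gradual change selects the mask at which the response levels off.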

For some intensity distributions, more than one mask size will satisfy the
selection criterion. For the distribution shown in figure 1c, the criterion is satisfied by s = 10
and s = 85 to 100 (depending on the algorithm that interprets "appreciably"), as can be seen
from trace 3 of figure 1d. Such a distribution would give rise to three assertions, a sharp
negative edge close to a sharp positive one, and a fuzzy positive edge that encompasses the
other two.

This shows one way in which the use of multiple mask sizes is important, but
there is another reason which is nearly as important. It is that where a faint edge exists in
the image, it is frequently impossible to tell from a single record which of the peaks are
important, and which are due to noise. Matching peaks obtained using different sizes of mask
greatly aids the separation of signal from noise.

The algorithm to which this leads is similar to the non-linear technique
described by Rosenfeld & Thurston (1971). The difference lies in the use to which the
algorithm is put. Rosenfeld & Thurston used it for detecting texture boundaries at which the
average gray-level change was small compared with the contrast occurring within each
texture. To achieve successful results, they required that measurements from masks of all
sizes be available at all points in the image. (Note that unlike spatial frequency, the denser
the measurements, the more information one has. If measurements are made at every point
and sufficient information is available about the boundaries, a finite intensity array is
completely recoverable from its convolution with any edge- or bar-shaped mask that is not
too large). In the present theory, texture boundaries are detected by other means, and the
algorithm is used simply to obtain a measure of the spatial extent of an intensity change.
Hence unlike Rosenfeld & Thurston's application, the distance between measurements can
decrease as the size of the mask increases without weakening the technique.

The process of computing the description consists then of four operations: (1)
find and match peaks in the measurements obtained from the convolutions of the image with
different sizes of mask; (2) select the relevant peaks using the selection criterion; (3)
separate the peaks into isolated groups; and (4) parse the local configuration of peaks into a
descriptive element. A small number of classes of peak configuration suffices to cover the
cases that can actually occur, and they are illustrated in figure 2. The figure shows typical
combinations of peak patterns that occur in the outputs from edge-mask (upper records) and
from bar-mask (lower records) convolutions. Examples of the masks that we use appear in
figure 2a. The descriptor EDGE is used when two peaks of about equal and opposite signs
occur together in the bar-mask record (2b). If one bar-mask peak is considerably smaller
than the other, the edge is classified as an EXTENDED-EDGE (2c). Extended-edges are common
where a convex boundary is illuminated from one side. Figure 2d shows an intensity gradient
edge, and figure 2e corresponds to the presence of a thin LINE such as can occur in the
highlight from an object's edge, or a very thin pencil stroke. Finally there are edges that
begin and end gradually, and extend over a relatively large distance; these are classified as
SHADING-EDGEs (figure 2f). In addition to descriptors of edge type, one can measure an
edge's CONTRAST, POSITION, ORIENTATION, and FUZZINESS. This last parameter characterizes
the spatial extent of the edge.
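
The distinction between EDGE and EXTENDED-EDGE drawn above can be sketched in Python for
a single isolated group of bar-mask peaks. The noise `threshold`, the `equal_ratio` that
stands in for "about equal", and the function names are assumptions of this example; the
remaining classes of figure 2 are not attempted and are reported as UNCLASSIFIED.

```python
def find_peaks(record, threshold):
    """Return (index, value) for local extrema whose magnitude reaches `threshold`."""
    peaks = []
    for i in range(1, len(record) - 1):
        v = record[i]
        if abs(v) < threshold:
            continue
        is_max = v > 0 and v >= record[i - 1] and v > record[i + 1]
        is_min = v < 0 and v <= record[i - 1] and v < record[i + 1]
        if is_max or is_min:
            peaks.append((i, v))
    return peaks


def classify_bar_peaks(peaks, equal_ratio=0.6):
    """Parse one isolated group of bar-mask peaks into a descriptor.

    Two peaks of opposite sign and about equal size give EDGE; if one is
    considerably smaller than the other the result is EXTENDED-EDGE.
    `equal_ratio` (an assumed value) decides what counts as "about equal".
    """
    if len(peaks) != 2:
        return "UNCLASSIFIED"
    (_, a), (_, b) = peaks
    if a * b >= 0:
        return "UNCLASSIFIED"
    ratio = min(abs(a), abs(b)) / max(abs(a), abs(b))
    return "EDGE" if ratio >= equal_ratio else "EXTENDED-EDGE"


# Hypothetical bar-mask records for two isolated intensity changes.
balanced = [0, 0, -3, -10, -3, 0, 0, 3, 10, 3, 0, 0]
lopsided = [0, 0, -2, -3, -2, 0, 0, 3, 10, 3, 0]

kind_balanced = classify_bar_peaks(find_peaks(balanced, threshold=2))
kind_lopsided = classify_bar_peaks(find_peaks(lopsided, threshold=2))
```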

Figure 3 gives an example of an intensity distribution that has been described
by this process, and the legend explains which mask-convolutions were used. One of the
assertions has been traced back to the convolution profiles, and the arrows point to the peaks
that gave rise to that particular assertion. The low-level vocabulary that is used in our
present system is not intended to be definitive, but some claim is made to the effect that it is
a good example of the genre, because it rests on the correct measurements, it has sufficient
expressive power to describe most kinds of shading adequately, and the method is simple and
works reasonably well.

Extension to two dimensions

The method may be extended to two dimensions by carrying out the analysis
simultaneously at several different orientations. It is preferable to use orientation-dependent
measures for making the initial measurements, for reasons that are illustrated by figure 5. The
image (5a) of a chair (128 points square), whose half-tone image is figure 4a and whose
intensity distribution is shown in figure 5b, has been convolved with "corner-shaped" masks.
The results appear in figures 5c & d, but can the reader confidently distinguish the corners
from these measurements? The reason for the failure is that the inverse transform to that
produced by a corner-shaped mask depends critically on the boundary conditions that obtain.
Any method that computes a corner assertion is saying something about this inverse and so



FIGURE 4. This figure provides a high quality reproduction of the images discussed in the
text. a and b were taken with a considerably modified Information International Incorporated
Vidissector, and the rest were taken with a Telemation TMC-2100 vidicon camera attached to
a Spatial Data Systems digitizer (Camera Eye 108). The full dynamic range from black to
white is represented by 256 gray levels. The images reproduced here were created by an
Optronics P1500 Photowriter from intensity arrays that measured 128 elements square. This
size of intensity array corresponds to viewing a 1 inch square at 5 feet with the human
retina. The image of the period at the end of this sentence probably covers more than 40
retinal receptors. The reader should view the images from a distance of about five feet when
assessing the performance of the programs. In the interests of clarity, these intensity arrays
have been displayed in two other ways (where helpful). They have been printed on a
Xerographic printer using a font of 16 gray levels; and they have been displayed as a three-
dimensional graph, in which the z coordinate represents intensity. These displays appear in
the figures.
FIGURE 5. The image of the chair whose half-tone representation is given in figure 4a, has
been printed in a 16 gray-level font in (a). A three-dimensional intensity map (height = log
intensity) appears in (b). This image has been convolved with two "corner-masks" (c) and (d).
Detecting corners from such measurements alone is not an easy task. This illustrates why it is
difficult to compute a description of an image directly from measurements that are not
directionally selective.
EARLY VISUAL PROCESSING


FIGURE 6. The first step in computing the primal sketch of the image CHAIR is to compute a
description of the gray-level changes at each of eight orientations. The results of doing this
at four orientations are shown here. The orientations are arranged clockwise from the
vertical (a), 22.5 degrees to the vertical (b), horizontal (c), and 45.0 degrees to the horizontal
(d). The descriptions were obtained by scanning every other line perpendicular to the
orientation of the masks. Each division on the axes represents ten image elements. Two sizes
of bar-mask and one edge-mask were used. This included a bar-mask of panel-width 2 and
length 10, in addition to the masks shown in figure 2a. Each of the letters in each figure
represents an assertion like that given in the legend to figure 3. The axes are marked at
multiples of 10 picture elements.

must take enough information into account at each point to satisfy the dependence on
boundary conditions. This extra information may be supplied by looking at the results of the
corner mask at neighboring points or by looking at the results of some other measurement
taken in parallel; the important point is that the computation is not a trivial one and it has to
take these extra factors into account.

The way to avoid the difficulty is to make the masks so orientation-dependent
that they push the problem back into one dimension. To take account of the boundary
conditions associated with edge- or bar-shaped masks, one needs to compare quantities in
only two directions, rather than in all directions round a point. This makes it inherently
simpler to compute the primal sketch from measures obtained with such masks, and it is why
we use them. Notice that this argument is independent of presumed properties of the image.
It is not impossible to compute the primal sketch from measures that are not directionally
selective, but a persuasive case would have to be made for choosing them.

Combining orientation-dependent descriptions

The number of different orientations at which the analysis needs to be carried
out is fixed by the first stage at which local assertions are glued together. The sensitivity
of the masks is not so important, as we can see by calculating their orientation tuning curves.
The ratio of panel-length to panel-width in the masks that we use is about 5:1 (figure 2a). If
such a mask is rotated about a step-change edge, the angular distance between the maximum
response and 1/√2 of the maximum is about 35 degrees, so their natural tuning curves are
very broad.

Much more critical is the flexibility with which individual elements are combined
to form assertions about small edge-segments. This process is the beginning of the grouping
phenomena that seem to be central to early visual processing, and designing it has been the
main stumbling block in writing competent edge-detectors. One of the best of them (Horn
1973) requires that lines should have length 10 before evidence of their existence is accepted
as compelling. It was designed this way because if substantially shorter elements are
accepted, a large amount of "noise" appears in the output. Blobs and blotches, common in
textured images, often give rise to elements that are shorter than this, so ways have to be
found of dealing with the noise.

Figure 6 gives some examples of the data with which one has to deal. This
shows the primary analysis of CHAIR at the vertical (6a), 22.5 degrees to the vertical (6b),
horizontal (6c) and 45.0 degrees to the horizontal (6d). For each mask orientation, the image
has been scanned along every other line perpendicular to the mask, and every point along
each scan line was considered. We have to use a fine scan because the smallest masks used
were so tiny. Each symbol E in figure 6 represents an assertion like that given in the legend
to figure 3. With this scan, it is sufficient to use a primary grouping that operates
independently along eight orientations 22.5 degrees apart. The grouping requires that the
types of adjacent primary assertions (represented by the E) should roughly match (for
example EDGE matches EXTENDED-EDGE but not LINE), and that the relative positions of the
two assertions should be appropriate. Edges whose orientations lie midway between two


scanning directions are sometimes found by both neighboring scans, which shows that eight
orientations are sufficient at this stage. Some technical problems have to be dealt with
before this process will work successfully, but they are too minor to be treated here (see
Marr 1976).
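
A minimal Python sketch of the primary grouping along one orientation is given below. It
only illustrates the two stated requirements (roughly matching types, appropriate relative
positions); the greedy chaining strategy, the `max_shift` tolerance, the `line_step` of 2
for "every other line", and all names are assumptions of the example rather than the
method actually used.

```python
# Pairs of different types that are allowed to match. Only the pair named in
# the text is listed; any further entries would be an assumption.
COMPATIBLE = {frozenset(("EDGE", "EXTENDED-EDGE"))}


def types_match(a, b):
    """Identical types match, and so do the pairs listed in COMPATIBLE."""
    return a == b or frozenset((a, b)) in COMPATIBLE


def glue_assertions(assertions, max_shift=1.5, line_step=2):
    """Chain assertions found on successive scan lines of one orientation.

    Each assertion is (scan_line, position_along_scan, kind). An assertion
    joins an existing chain when it lies on the next scanned line, its
    position differs from the chain's last element by at most `max_shift`,
    and the two types roughly match. Otherwise it starts a new chain.
    """
    chains = []
    for a in sorted(assertions):
        for chain in chains:
            last = chain[-1]
            if (a[0] - last[0] == line_step
                    and abs(a[1] - last[1]) <= max_shift
                    and types_match(a[2], last[2])):
                chain.append(a)
                break
        else:
            chains.append([a])
    return chains


# Hypothetical assertions from three scanned lines (0, 2 and 4).
found = [
    (0, 10.0, "EDGE"),
    (2, 10.5, "EXTENDED-EDGE"),
    (4, 11.0, "EDGE"),
    (2, 30.0, "LINE"),
    (4, 30.0, "EDGE"),
]
chains = glue_assertions(found)
```

In this example the three assertions near position 10 are glued into one short segment,
while the LINE and the EDGE near position 30 stay separate because their types do not match.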

By the time the primitive elements have been assembled into straight edge-
segments, evidence that they originated from eight scans has almost evaporated. It is
advisable not to quantize the orientations of the glued edge-segments, because doing so can
cause confusion between a straight line and one containing many small kinks. It is however
possible to devise a discrete representation system for the segments, in which a segment of a
given orientation is represented by linear interpolation between fixed, standard orientations.
Most schemes of this sort require some mutual "inhibition" between carriers of neighboring
components in order that the contrast of the intermediate edge should be represented linearly
(see Marr 1976). Such inhibition arises for purely representational reasons. The main force
behind the initial gluing process is the consistency relations between nearby primitive
elements.

Nevertheless there turns out to be a need for competition between scans at
different orientations, that arises for reasons which are intrinsic to the analysis, not just from
a representational convenience. The surprising part is that the competition is required not
between segments at nearly adjacent orientations, but between ones that are nearly
perpendicular.

Figure 7 illustrates the problems that arise. The image of a rod (figure 4b,
figures 7a and 7b) was first operated on at eight orientations with the process described in
the last section. Next, those local assertions have been glued along directions nearly parallel
to the masks from which they were obtained. Each edge-segment in figures 7c & d
represents several of the E's of the type shown in figure 6, and the database records all of
the parameters associated with each segment. Quantities like the edge type, contrast and
fuzziness are specified at intervals along the longer segments, since they can change along
them. The longer segments should properly be regarded as a sequence of collinear short
segments. In a full vision system, discontinuities of binocular disparity or motion along such
an edge could still prevent the assembly of its subsegments into a single unit.

The feature of the data that is relevant to inter-orientation competition is the
abundance of short segments roughly perpendicular to the primary edge (figure 7c). These
are caused by a combination of local noise, the image tessellation, and irregularities in the
image. They occur in every image that we have processed. In dealing with them, one cannot
dismiss in a cavalier manner all very short segments: tiny "blobs" in the image also give rise
to them, as can be seen from the same image at coordinate (73, 75). But a "small" element
like this can be ignored if (a) it crosses a "long" element, and (b) its contrast is less than that
of the item it crosses. Figure 7d shows the results of removing small noise elements using
this criterion. Occasionally, two small noisy segments can accidentally become aligned,
creating a longer noisy segment. These are eliminated in the same way.
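
The two-part criterion for discarding short noise segments can be sketched as follows. This is an illustration rather than the original program: the `Segment` type, the function names, and the numerical thresholds for "small" and "long" are all assumptions, since the text gives no values for them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    x1: float
    y1: float
    x2: float
    y2: float
    contrast: float

    @property
    def length(self) -> float:
        return math.hypot(self.x2 - self.x1, self.y2 - self.y1)


def _side(ax, ay, bx, by, cx, cy):
    """Signed area test: which side of the line a->b the point c lies on."""
    return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)


def crosses(s: Segment, t: Segment) -> bool:
    """True if the two segments properly intersect (cross each other)."""
    d1 = _side(t.x1, t.y1, t.x2, t.y2, s.x1, s.y1)
    d2 = _side(t.x1, t.y1, t.x2, t.y2, s.x2, s.y2)
    d3 = _side(s.x1, s.y1, s.x2, s.y2, t.x1, t.y1)
    d4 = _side(s.x1, s.y1, s.x2, s.y2, t.x2, t.y2)
    return d1 * d2 < 0 and d3 * d4 < 0


def remove_short_noise(segments, small_max=5.0, long_min=10.0):
    """Drop a small segment only if (a) it crosses a long one and
    (b) its contrast is less than that of the segment it crosses.
    small_max and long_min are assumed thresholds, in image elements."""
    kept = []
    for s in segments:
        is_noise = s.length <= small_max and any(
            t.length >= long_min and crosses(s, t) and s.contrast < t.contrast
            for t in segments
        )
        if not is_noise:
            kept.append(s)
    return kept
```

Note that an isolated short segment, such as one arising from a tiny blob, survives this filter because it crosses nothing; only short, weak segments lying across a long edge are removed.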

FIGURE 7. The second step in computing the primal sketch. After the intensity changes have
been described independently at each of 8 orientations, and after local linear assembly of
these descriptions has taken place, the eight descriptions are combined. This process is
illustrated here for a particularly simple image, of the rod whose half-tone representation
appears as figure 4b. The printed version of this image appears as (a), and the intensity map
as (b). The results are combined to give the data shown in (c). Each tiny line segment
corresponds to two or more individual assertions (like those illustrated in figure 6), and a
summary of the information associated with each of those assertions (as in figure 3) is made
at intervals along each segment. Only information about the positions of the segments and
about the precursors of termination assertions (shown as crosses) can conveniently be
represented in a diagram; this can give a misleading impression of some items in the primal
sketch. For example, many of the lines on the curved part of the rod on the left of the image
arise from shading edges. They describe the gradual intensity changes that take place there,
and should not be thought of in the same way as sharper edges. Short noise elimination then
takes place to give (d), which gives a fair idea of the messiness of the uninterpreted primal
sketch.

FIGURE 8. The difference between the primal sketch and a feature-point array is brought out
by an intensity distribution like (a). A measurement taken with a large mask (b) would
generate a feature-point, but it would not be used in the computation of the primal sketch.
This is because the sharp contrast changes operate through the selection criterion to force
the description to be computed from small masks like those shown in (c). The final description
is of two blobs, which define "places" (d).

The crosses in the figure (sometimes rotated to avoid alignment with the edge
segment to which they are attached) signify that the contrast of a directed segment changes
rapidly at that point, possibly becoming zero. They are the precursors of assertions about
the presence of terminations, and may be thought of as identifying the exact position of a
termination if one exists near there. The problems that arise in obtaining them are dealt with
elsewhere (Marr 1976).

One other item of note in computing the primal sketch is the question of
detecting local, small blobs. Figure 7d at coordinate (73, 75) shows how they appear, and in
fact we make small blobs a primitive element of the primal sketch, together with their
associated contrast, and the sizes and orientations of their major and minor axes. The
defining criterion for when an image item is small enough to be called a blob or a line is that
it should be indivisible. This occurs when one of its dimensions is comparable with the
resolution of the analysis of the image at that point (about 5 image elements in length).
Finding blobs from the glued assertions depends a small amount on elegant programming, and
a large amount on brute force (Marr 1976).

Some consequences

As a model of the information-processing that is performed in area 17 of the
monkey, these ideas have one main consequence whose disproof would destroy the theory. It
is that the direct output of a linear simple cell is not available centrally. Its signal is used to
create an assertion about the presence of an edge, and that assertion is what is available.
Creating the assertion is an act of computation - a simple one, since it involves little more
than peak matching, applying the selection criterion, and the classification of the peak
configuration: but it is an act of computation nonetheless. The main point is that this has to
go on, and one should therefore be able to find experimental evidence of it.

A consequence of this view is illustrated in figure 8. Suppose that an image
contains two small close blobs. These blobs give rise to measurements by a number of sizes
of mask - some small ones represented by the tiny line segments, and some large ones, like
the one that is illustrated. One's a priori inclination might be that a large "line detector"
would fire, and that this would have something to do with seeing the two blobs. This view
amounts to supposing that simple cells write directly into a feature-point array that is freely
available to subsequent processes. But if our theory is correct, although the large "simple
cell" may indeed fire, its measurement will not be used to compute the description of the two
blobs because their sharp boundaries cause the associated intensity change to be described
from peaks in the small masks (by the selection criterion). The selection criterion (figure 1d)
will cause the description to be computed from the smaller masks unless the blobs are
severely defocussed.

Another interesting point is that we fail to "see" Abraham Lincoln in L. D.
Harmon's coarsely sampled and quantized image of him (reproduced by Julesz 1971 p.311). If
measurements from linear simple cells were freely available to later processes, and if we
were able to select them by receptive-field size, we would presumably be able to interpret
that image without physically defocussing it. According to the present theory, the mask size
used to compute the description is chosen by the selection criterion. This is consistent with
Harmon & Julesz's (1973) finding that noise bands spectrally adjacent to a picture's spectrum
are most effective at suppressing recognition, since these have most effect on mask response
amplitudes near the important mask sizes. Furthermore, because two peaks in the graph 1d
would cause the algorithm to create two local edge assertions (with different degrees of
fuzziness), it also explains why removal of only the middle spatial frequencies from such an
image leaves a recognizable image of Lincoln behind a visible graticule (figure 1d of Harmon &
Julesz 1973).

The structure of the raw primal sketch as it is first delivered from the image
may be summarized as follows:

PS1. The primary visual processor delivers a symbolic description of the intensity changes
present in an image. This description uses the following primitives to describe intensity
changes:

(i) Various types of EDGE.
(ii) LINEs, or thin BARs.
(iii) BLOBs.

The items (i) and (ii) have been assembled into straight segments, and short noise elimination
has occurred.

PS2. The following items are computed and bound to each element of the
description.

(i) ORIENTATION - of an edge, line or bar; of the major axis of
a blob or a group.
(ii) SIZE - length and width if both are defined, diameter if
major and minor axes are equal or undefined.
(iii) Local CONTRAST.
(iv) POSITION.
(v) TERMINATION POINTS.
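
The summary PS1 and PS2 can be read as a record type. The following is a minimal sketch of such a record, assuming field names, units and Python types that do not come from the original implementation; it only mirrors the primitives and bound parameters listed above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple, Union


class Primitive(Enum):
    EDGE = "EDGE"
    LINE = "LINE"
    BAR = "BAR"
    BLOB = "BLOB"


@dataclass
class SketchElement:
    """One element of the raw primal sketch (hypothetical layout)."""
    primitive: Primitive
    position: Tuple[float, float]          # PS2 (iv)
    contrast: float                        # PS2 (iii), local contrast
    orientation: Optional[float] = None    # PS2 (i), degrees, None if undefined
    length: Optional[float] = None         # PS2 (ii)
    width: Optional[float] = None          # PS2 (ii)
    terminations: List[Tuple[float, float]] = field(default_factory=list)  # PS2 (v)

    def size(self) -> Union[None, float, Tuple[float, float]]:
        """Length and width if both are defined and differ, else one diameter."""
        if self.length is not None and self.width is not None:
            if self.length != self.width:
                return (self.length, self.width)
            return self.length
        return self.length if self.length is not None else self.width
```

For example, a small round blob reports a single diameter, whereas an edge segment reports both its length and its width.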

What drawings tell us

The second step of the argument depends on our ability to interpret simple
pencil drawings that lack semantic content. By examining suitable examples, we can infer with
some confidence that certain symbolic grouping operations must exist in our visual systems.
In order to establish the principle that grouping processes sometimes exist, let us first take
an extreme case. When one looks at figure 9a, there can be little doubt that some process is
creating a circular contour joining the inner ends of the radial lines. The path of this contour
is marked by an apparent change in brightness, less than but comparable to that observed in
the Kanizsa triangle illusion.

In deciding how this comes about, we may distinguish three rival theories. (1)
A local process operates to join neighbouring ends of lines. (2) The inner contour is
constructed by some mechanism that relies upon the placing of an edge-shaped mask in the
position shown in figure 9b. (3) The radial lines cause a "Gestalt" of a "sun" to be instantiated
for describing the situation. This very high-level concept then imposes the contour on the
figure.

If (2) were correct, it would disprove the primal sketch theory, since it requires
that a mask output value be identified in a simplistic way with an assertion about a contour.
Illustration 9c disproves (2) however, because the contour remains visible despite the
presence of an intensity distribution that would remove or negate the mask values on which
(2) depends. If (3) were correct, it would imply that downward-flowing information has a
direct influence on early processing - a view which runs counter to the second main thrust of
the present theory. Theory (3) assumes a sensitivity to radial lines. The lines in figure 9d
are however also radial, and this is not immediately obvious.

The possibility remains that some combination of (1) and (3) is what really
governs our perception of the figure. The important point is that the initial acquisition of the
"sun" concept probably relies on the mechanisms in (1). Once accessed, this Gestalt may
influence the computations to the extent of deciding that the sun part is the foreground and is
therefore slightly brighter, but such an influence determines only one bit of the final
description. Figure 9e makes it unlikely that the particular "sun" gestalt has even this effect,
since it provides a similar example in which "ends-of-things" form a perceptually "brighter"
obscuring region. It is more likely that the relative brightness reflects a (context-sensitive)
assumption about the sign of foreground-background contrast.

These examples establish that abstractly defined "places" in an image can be
assembled into contours that have a definite perceptual existence, and that this operation
probably precedes the access and application of higher-level concepts to the image. From a
computational point of view, it is natural to think of the phenomena as occurring in two steps.
Firstly, certain things in drawings can cause "place-tokens" to be defined in some abstract
sense. Secondly, place-tokens so defined can be grouped in various ways.

In how many ways may place-tokens be defined, and in what ways may they be
grouped? We see from figure 10a that a short line may define a place-token, and from figure
10b that a small blob may also do so. The end of a line that is not too short, or of a blob with
long major axis and short minor may also define a place-token. (The imprecision of the
boundary between "too long" and "too short" is inconsequential, because near it, both
definitions usually lead to the same groups. The boundary needs to be in the region of 0.5 to
1 degrees of arc at human foveal resolution.) Small collections of blobs (figure 10c) or of
lines (figure 10d) may also be treated as a unit. Because of the variety of ways in which this
may be done (figure 10e), it is probably implemented by the rule that a group of place-tokens
may also define a place-token, rather than by different rules for groups of blobs, groups of
lines, groups half of blobs and half of lines and so forth. Hence although place-tokens can be
described and to some extent selected by properties of items at that place in the image, the
grouping processes themselves read place-tokens and are insensitive to the particular way a
place-token was obtained. The notion of a place-token is a good example of the principle of
explicit naming, and the separation of the way in which a place-token is defined from the way
in which it is grouped illustrates the principle of modular design.

FIGURE 9. The illusory contour in (a) is somewhat similar to the Kanizsa triangle. It cannot be
due to a simple cell in configuration (b), because the contour is still visible in (c). It cannot be
due to a gestalt of the sun induced by radial lines, because the lines in (d) are radial, yet this
is not readily apparent. A similar illusion is present in (e), suggesting that the apparent
brightness of the inner disc reflects a default assumption about foreground-background
contrast, rather than any high level influence. The theory attributes the contour to local
processes that join nearby ends of lines. Such processes are mechanisms of construction
rather than mechanisms of detection.

FIGURE 10. Place-tokens may be defined in an image in several ways, and may then be
aggregated by certain standard techniques. Small lines (a) or blobs (b) may define a
place-token. So may small collections of places (c and d). The definition and the grouping of
place-tokens may be regarded as independent processes, because grouping does not depend on the
way the place-tokens were defined. This is shown by (e), in which every subgroup is defined
differently, yet the collinearity of all of them is immediately apparent. Information such as
orientation may be bound to a place-token because it was intrinsic to the element that gave
rise to it. Such information may be used to help grouping.

The recursive character of the definition of a place-token leads one to expect
that the grouping processes responsible for them read and write into the same storage.
Otherwise, one would have to maintain many copies of the storage and grouping processes,
instead of just one. If only one copy is kept, two organisational rules must be observed.
Firstly, whenever a set of place-tokens is grouped to form a new one, only the new token is
subsequently visible to the grouping processes; and secondly, there is a priority system that
operates among competing processes such that (for example) very local groups usually take
precedence. Interaction between rival local groupings is often necessary to arrive at a
grouping satisfactory to them all. In figure 10e, the local groups are formed before their
organisation into a line. Grouping processes are sensitive to orientation, intensity (lightness),
fuzziness, and various measures of the size of an item in the base, as well as to spatial
proximity and collinearity. For example, orientation information may or may not be present
(10a, 10b, 11a, 11b), and if present, it may (11a, 11b) or may not (10a) be used. Indeed
these two situations can occur in the same figure (11c). Combinations of spatial proximity and
of similar orientation are often important.
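
The first organisational rule for a shared storage (after grouping, only the new token remains visible) can be sketched as a small data structure. The class and method names are hypothetical, and placing the new token at the centroid of its members is an assumption made only to keep the example concrete. The priority rule is represented merely by the order in which the caller forms groups.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PlaceToken:
    name: str                                   # explicit internal name
    position: Tuple[float, float]
    members: List["PlaceToken"] = field(default_factory=list)


class TokenStore:
    """Single storage that grouping processes both read and write."""

    def __init__(self) -> None:
        self._visible: Dict[str, PlaceToken] = {}

    def add(self, token: PlaceToken) -> None:
        self._visible[token.name] = token

    def visible_names(self) -> List[str]:
        return sorted(self._visible)

    def group(self, names: List[str], new_name: str) -> PlaceToken:
        # The members are withdrawn from view; only the new token stays visible.
        members = [self._visible.pop(n) for n in names]
        x = sum(m.position[0] for m in members) / len(members)
        y = sum(m.position[1] for m in members) / len(members)
        token = PlaceToken(new_name, (x, y), members)
        self._visible[new_name] = token
        return token


store = TokenStore()
for name, pos in [("a", (0.0, 0.0)), ("b", (1.0, 0.0)), ("c", (10.0, 0.0))]:
    store.add(PlaceToken(name, pos))
local = store.group(["a", "b"], "g1")   # the very local group is formed first
line = store.group(["g1", "c"], "g2")   # a group of place-tokens is itself a place-token
```

Because a group is stored as an ordinary place-token, the second call does not need to know that "g1" was built from two smaller tokens, which is the modularity the text describes.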

We see from these examples that place-tokens can be grouped into regions
directly, or into curvilinear assemblies that define regions by acting as their boundaries. The
Gestalt psychologists were aware of these grouping phenomena (Wertheimer 1923). In
addition to the region-defining facilities just mentioned, if the number of places involved is
very small (less than 5 say), the places may form a standard, named configuration (see figure
11e-h) which is evidently described relative to an axis which is imposed on the figure, and
whose default value is the vertical.


Separation of figure and ground

Before the digression of the last section, we had reached the point of defining
the raw primal sketch, and of showing how to compute most of the quantities in it. We also
examined the primal sketch of a very straightforward image, of a rod. The primal sketch is
rarely as simple as that, however. Figures 13 and 21, among others, contain examples of the
primal sketches of more complex images, and as one might expect, they are in general large and
unwieldy collections of data. Furthermore, it is difficult to see how the complexity of the
primal sketch could be an artifact of our particular choice of primitives: images really are
complex in this way.

The unwieldy nature of the primal sketch creates what appears to be the main
task of the next stage of visual information processing: how do we select regions that should
be treated as unit forms by subsequent descriptive processes, and can this be done without
complex interactions between the primal sketch and hypotheses about the nature of the forms
that are being extracted? In perceptual terms, the computational problem that we must now
address corresponds to distinguishing between figure and ground, and it is strongly related to
the problem of texture vision (Julesz 1971 e.g. pp 105 ff). In neurophysiological terms, if
area 17 roughly speaking computes the primal sketch, we come now to the problem that the
next stage must solve.

We have now reached the core of the first part of the theory. We saw in the
last section that certain computational facilities exist and are deployed during our reading of
certain kinds of drawings. It is of course possible that their existence is no more than a
happy accident, which fortuitously allows us to interpret the idle scribblings of the artistically
gifted. The present theory was however founded on the observation that drawings and
images appear surprisingly similar. It takes the view that the processes exhibited by the
drawings of figures 9, 10 and 11 are not empty examples: the ability to perceive the
envelope of a tree, a row of bushes, or even the border of a grass lawn can depend on such
processes, and they are part of the reason why computer vision has had such problems
finding object boundaries in the past. A central assertion of this theory is that these grouping
processes are available precisely because they are needed to help interpret the primal sketch;
and furthermore that these grouping processes, together with first-order discriminations,
acting recursively on the descriptions in the primal sketch, are sufficient to account for most
of the range of "non-attentive" vision of which we are capable, within the class of images to
which this article is restricted. In other words, the extraction of forms and associated
"texture" discriminations are actually implemented by first order discriminations, together with
a small number of grouping operations, acting on the primal sketch of the image. We now
study in more detail the grouping operations on which the second part of the theory depends.

Grouping techniques

The purpose of the grouping techniques outlined here is therefore to partition
the primal sketch into unit forms, in a way that is useful for subsequent recognition. The
important question concerns the extent to which hypotheses about the nature of a form need
to interact with the processes that extract it. The issue is one of degree, not principle, since
we shall show that some downward-flowing information may be necessary to complete
segmentation. The demands of speed and fluency make it desirable to minimize these
downward influences, and our main conclusion is that for most images, such influences affect
only a small number of the decisions taken during grouping.

The most important guideline for the design of grouping techniques is the
principle of least commitment. According to this principle, each step is irreversible. Hence
only groupings that are reasonably certain may be made. This forces one to decompose the
overall process into several steps, and to take advantage of as many cues as possible to help
in the decisions that are made at each step.

Curvilinear aggregation

We define curvilinear aggregation to mean the assembly of place-tokens that
contain an orientation into a group that preserves it. This type of aggregation is
one-dimensional rather than two, and the discovery and use of the appropriate local orientation is
central to it. We shall see that one-dimensional grouping processes are by far the most
important kind. Two-dimensional grouping seems to be necessary only locally: larger regions
that are characterised by a texture predicate are best found by computing their boundaries.

Information that determines whether two items should be grouped comes
initially from their primal sketch parameters and spatial dispositions. The primal sketch
parameters are orientation, contrast, type (EDGE, LINE etc.), and fuzziness. Spatial information
includes the distance between the nearest parts of the two items, and the relationship
between the orientations associated with the items and the orientation of the line joining their
nearest parts.
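
The spatial information just described can be computed in a few lines. The sketch below is an illustration under stated assumptions: each item is reduced to a straight segment given by its two endpoints, the "nearest parts" are approximated by the nearest pair of endpoints, and the function names are hypothetical.

```python
import math


def undirected_angle(p, q):
    """Orientation of the line through p and q, in degrees within [0, 180)."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0])) % 180.0


def angle_between(a, b):
    """Smallest difference between two undirected orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def spatial_relation(seg_a, seg_b):
    """Gap between nearest endpoints, and each item's orientation relative
    to the orientation of the line joining those nearest endpoints."""
    pa, pb = min(
        ((p, q) for p in seg_a for q in seg_b),
        key=lambda pair: math.dist(pair[0], pair[1]),
    )
    gap = math.dist(pa, pb)
    if gap == 0.0:
        # The joining line has no orientation when the items touch.
        return {"gap": 0.0, "a_vs_join": None, "b_vs_join": None}
    join = undirected_angle(pa, pb)
    return {
        "gap": gap,
        "a_vs_join": angle_between(undirected_angle(*seg_a), join),
        "b_vs_join": angle_between(undirected_angle(*seg_b), join),
    }
```

Two collinear segments separated by a small gap give small values for all three quantities, whereas a segment meeting another at a right angle gives a large value for one of the two angular terms.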

FIGURE 11. (a) and (b) give examples of groupings in which orientation is important. In (c),
orientation is important for constructing the square, but not for perceiving the collinearity of
the rotated "Vs" across the middle. Information about similarity of orientation is used if it can
be, but it is not disastrous if it cannot be. (d) shows how the orientation of a small aggregate
can be used to form a larger aggregate. Evidence like this suggests that the results of these
primary aggregation processes are written into the same storage as the primal sketch. (e) to
(h) give some examples of "standard configurations" that we have found it useful to recognize.
The reader will probably perceive them relative to a vertical axis. The VEE shown in (h) is
used in figure 21d.

FIGURE 12. Examples in which semi-local and global constraints can influence local measures
of preference during aggregation. (a) shows a set of place-tokens, and (b) illustrates the
possible pairwise groupings that local neighborhood analysis permits. The situation after the
first pass is shown in (c). Information from this pass makes (d) the preferred link on the
second pass. In (e), the links between 1 & 2, and between 1 & 3 are evaluated as equally
desirable on purely local grounds. The overall closure property creates a preference for the
link that uses 2. In the primal sketch, the affinity between two elements is evaluated
simultaneously along several dimensions. Considerations such as these can often cause a
particular grouping to emerge as clearly preferable to any others.

FIGURE 13. The image PLANT, whose half-tone representation appears in figure 4c, has been
printed in (a). The actual intensity values that occur within the superimposed rectangle have
been set out in table 1. The spatial information from the primal sketch of this image is given
in (b). Typical segments that arise from the first two stages in curvilinear aggregation appear
in (c) and (d). The primal sketch does not contain quite enough information to separate the
two leaves, and the aggregation techniques deliver the form (e). They have however almost
succeeded in the separation. If one piece of information is added (that segment 1 does not
match segment 2), the aggregation routines can separate (e) into (f) and (g).

TABLE 1. The top table shows the intensity values for a small section of the PLANT (see
figure 13). The lower table gives the values of edge-mask convolutions over the same region.
Only residual decay from the edge above this region is measurable. No general-purpose
edge-finder could discern the edge of the nearer leaf in this part of the image.

Because of the principle of least commitment, the first stage of grouping
combines two elements only if they match in almost all respects, are very close to one
another, and if there are no other candidates. This typically reduces the number of groupable
elements to about a third of the number present in the raw primal sketch. The second stage
can then make use of extra information given by the first. Sometimes, the only extra clues
are that some segments are now quite long (more than 20 image elements). Such segments
almost certainly have some physical importance, and hence in the second stage it is safe to
combine two such elements even if they fail to match on some parameters, provided that
there are no other reasonable candidates in the vicinity. In some situations, the first stage
will actually have introduced new information which can then be used by the second stage.
For example, figure 12a shows a set of places that are to be aggregated, and the possible
links between nearby places are shown dotted in figure 12b. The first stage of aggregation
inserts the unambiguous segments (12c). By the second stage, an orientation parameter is
present, and this, together with the equal spacing of the collinear tokens, makes the grouping
shown in figure 12d the preferred one.
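
The conservative first stage of grouping can be illustrated by a short sketch. It is not the original routine: the dictionary layout of an element, the function names, and every numerical tolerance are assumptions, chosen only to show the three conditions (close match, close proximity, no rival candidate) acting together.

```python
import math


def compatible(a, b, max_gap=3.0, max_angle=10.0, max_contrast_ratio=1.5):
    """True if two elements match in type, orientation and contrast and are
    very close together. All tolerances are assumed values."""
    if a["type"] != b["type"]:
        return False
    if math.dist(a["pos"], b["pos"]) > max_gap:
        return False
    d = abs(a["orientation"] - b["orientation"]) % 180.0
    if min(d, 180.0 - d) > max_angle:
        return False
    low, high = sorted((a["contrast"], b["contrast"]))
    return high <= max_contrast_ratio * low


def first_stage_pairs(elements):
    """Pair two elements only when each is the other's sole candidate."""
    candidates = {
        a["id"]: [b["id"] for b in elements if b is not a and compatible(a, b)]
        for a in elements
    }
    pairs = []
    for a_id, cands in candidates.items():
        if len(cands) == 1:
            b_id = cands[0]
            if candidates[b_id] == [a_id] and a_id < b_id:
                pairs.append((a_id, b_id))
    return pairs


def element(name, pos):
    """Helper for the example: an EDGE element with fixed parameters."""
    return {"id": name, "pos": pos, "type": "EDGE",
            "orientation": 0.0, "contrast": 40.0}
```

With elements A and B close together and C far away, the only pair formed is (A, B). If a further element D is placed near both A and B, no pair is formed at all, since every choice now has a rival and is left for a later stage.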

Some results of these two grouping processes are illustrated by the analysis of
the image PLANT, which is exhibited because it raises several points of interest. Figure 13a
gives the printed image whose half-tone representation appears in figure 4c, and 13b shows
its primal sketch. Figures 13c and 13d show typical segments obtained by the above
processes. Notice the ragged nature of 13d; this is a common feature of the high resolution
analysis of indistinct object boundaries. The local orientation of the raw primal sketch
elements is preserved only roughly here.

Having exhausted all those situations in which aggregation takes place more or
less by default, we turn now to the other technique that characterizes an application of the
principle of least commitment, namely the rejection of relatively unlikely possibilities. The
method is to set up a node for each of the ends of the segments that were delivered by the
preceding processes, and to associate with each node a list of the nodes that could possibly
match this one. Notice how this presupposes that each segment-end can be assigned an
internal name (principle of explicit naming). Each of the possible matches is then evaluated
independently along several dimensions, and possibilities that are graded relatively poorly by
several methods of evaluation, and well by none, are struck out.
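
The rejection rule can be sketched as a filter over the candidate list of one node. This is an illustration only: grades are assumed to lie between 0 and 1, the thresholds for "poorly" and "well" and the meaning of "several" are assumed values, and absolute thresholds are used here as a simplification of the relative grading that the text describes.

```python
def filter_candidates(grades, poor=0.3, good=0.7, several=2):
    """Strike out candidates graded poorly by several criteria and well by none.

    grades maps a candidate name to {criterion: grade}, with grades in [0, 1].
    The thresholds poor, good and several are assumptions for illustration.
    """
    survivors = []
    for name, by_criterion in grades.items():
        n_poor = sum(1 for g in by_criterion.values() if g <= poor)
        n_good = sum(1 for g in by_criterion.values() if g >= good)
        if n_poor >= several and n_good == 0:
            continue  # struck out
        survivors.append(name)
    return survivors


# Hypothetical grades for three segment-ends that might match one node.
node_grades = {
    "end_7": {"contrast": 0.9, "orientation": 0.8, "distance": 0.6},
    "end_12": {"contrast": 0.2, "orientation": 0.1, "distance": 0.5},
    "end_15": {"contrast": 0.2, "orientation": 0.25, "distance": 0.9},
}
remaining = filter_candidates(node_grades)
```

Here "end_12" is rejected, while "end_15" survives despite two poor grades because one criterion favours it strongly. Adding a further criterion only adds entries to each inner dictionary, so the filter can reject more candidates but never needs restructuring.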

FIGURE 14. About two people in three fail to perceive the original of this image correctly the
first time. The failure is caused by the accidental alignment of the subject's forefinger and
nose. This failure shows that simple local processes are important during the analysis of an
image, and that delivery by them of an incorrect grouping is not a normal event. This is good
evidence against the hypothesis that early visual processing is designed around a
failure-driven control structure. The fact that one does not make the same mistake a second time
shows that some downward-flowing information can affect early processing. Only a small
amount is required to prevent recurrence of the error.

FIGURE 15. The measure of the overlap of two adjacent, parallel lines depends on an external
angle, theta. In (a), theta is 90 degrees, which is the value at which iteration begins in the
routines that decode this type of grouping. (b) and (c) show two other values of theta.

Our present implementation assesses the possible choices using measures of
relative contrast, orientation, alignment or misalignment, distance, edge type, fuzziness,
whether an item acts as a good intermediary between two segments that match very well, and
whether a closed form would be created by choosing a particular segment. The idea behind
this is straightforward. It has long been known to the Gestalt psychologists that in a line
drawing, each of these criteria can cause elements to be grouped together in a "preferred"
way (Wertheimer 1923). In the much richer environment of the primal sketch, there is
frequently enough information available to apply all of these criteria simultaneously. If most
or all of them agree in selecting a particular grouping, one can be certain enough of its
correctness to select that grouping irrevocably. There is nothing special about the way in
which the preferences of the different methods are combined: if an obvious choice exists, it
is taken, and any theory would select it. If the choice is not obvious one needs additional
information, and a theory that happened to make the correct choice on marginal grounds in
one image would fail in many others. The interesting point is an empirical one - that these
crude selection criteria are very effective. They enable one to solve simple images
completely, and almost to solve even quite difficult ones. Applying the criteria is relatively
inexpensive, because the number of segments that exist at this point is much less than the
number of items in the raw primal sketch. This type of filter analysis has the added attraction
of being readily extendable, because the addition of extra filtering criteria simply leads to the
rejection of more of the candidates at a given node.

All of the filtering criteria described above are local in the computational sense
that they do not depend on the results of subsequent higher-level processes. But this does
not mean that the criteria are spatially local. For example, which of the two segments 2 or 3
should be joined with segment 1 in figure 12e? No preference exists on purely local grounds,
but a decided preference arises from the closure property of the whole figure. Only a limited
degree of sensitivity to connectedness appears to be present in human visual systems (Minsky
& Papert 1969 p.73), but it is not hard to devise a detection scheme that would operate
sufficiently well to help in decoding many images, while failing to provide a complete
sensitivity to connectedness.

A detailed account of the selection criteria that appear to be useful will be
given in a separate article, but if these methods are taken as a theory of part of our own
visual systems, there is one consequence that would follow from even the sketchy account
given here. If it were true that most of the time, decisions about local groupings are taken
using criteria computed at roughly the same stage of the analysis, rather than by extensive
use of downward-flowing information, it should be possible to find images in which a
particular grouping is greatly to be preferred on most of the criteria described here, but
which is nevertheless incorrect. Furthermore, if low-level decisions are indeed irrevocable (as
the principle of least commitment asserts), their failure should cause severe damage to the
perceptual analysis of an image. Occasionally, one finds a photograph in which the accidental
alignment of contours causes this to happen, and figure 14 shows an image whose original is
misinterpreted the first time by about two people in three. The accidental alignment of the
forefinger with the nose appears to be responsible for the failure. It is interesting that one
does not make the same mistake the second time one views the picture: and that in the real
world where stereo disparity and motion information are also available, one almost never fails
at the same low level.

Transmittal of unresolved forms

The next important consequence of the principle of least commitment is that if
no clear leader emerges from the group of contending possibilities, all possibilities that were
not rejected are accepted. No arbitrary choices are made this early in the analysis. Nodes at
which an ambiguity exists are marked, and themselves form part of the information that is
sent to the next stage in the processing. The reason for doing this is that subsequent
processes then have access to whatever trouble-spots exist lower down. In the image PLANT,
part of the nearer leaf happens to have the same intensity as its background. Table 1a shows
the actual intensity values in the rectangle (34, 37) to (49, 58), and table 1b shows the
approximate edge-mask convolution values there. Although some intensity changes do exist
above this area (near (44, [illegible])), they are insufficiently distinguished to allow the
grouping methods described above to separate the two leaves. Accordingly, all of the segments
are included in one form, shown together with the segments it contains in figure 13e. (It has
been separated manually from the stem, for clarity.)

If the nodes that support this figure are maintained and can be influenced by
subsequent processes, the amount of information needed to separate the two leaves is very
small. For example, one decision can suffice: if it is asserted that segment (1) does not match
segment [illegible], this information is sufficient to allow the aggregation filter network to
decompose the image into the two parts shown in 13f and 13g. So although some downward-flowing
information is needed here, the amount required is small provided that it is applied so as to
use the partial results obtained at the lower level.
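
The bookkeeping described in the last two paragraphs can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the implemented system: the `Node`
class, the numeric preference scores, and the two thresholds `reject_below` and `lead_margin`
are all hypothetical. It shows a node that accepts a clear leader when there is one, otherwise
keeps every unrejected candidate and marks itself ambiguous, and is later resolved by a single
downward-flowing assertion.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A junction where a segment may continue into one of several candidates."""
    candidates: dict                      # candidate segment id -> preference score
    accepted: list = field(default_factory=list)
    ambiguous: bool = False


def resolve(node, reject_below=0.2, lead_margin=0.3):
    """Accept a clear leader; otherwise keep every candidate that was not rejected."""
    survivors = {s: v for s, v in node.candidates.items() if v >= reject_below}
    ranked = sorted(survivors, key=survivors.get, reverse=True)
    no_clear_leader = (
        len(ranked) >= 2 and survivors[ranked[0]] - survivors[ranked[1]] < lead_margin
    )
    if no_clear_leader:
        # No arbitrary choice is made: all survivors go forward, and the node is marked.
        node.accepted, node.ambiguous = ranked, True
    else:
        node.accepted, node.ambiguous = ranked[:1], False
    return node


def assert_no_match(node, segment):
    """A later process asserts that `segment` does not belong; re-run the local decision."""
    node.candidates.pop(segment, None)
    return resolve(node)
```

The point of the design is that the later assertion only has to remove one candidate; the
partial results held at the marked node do the rest.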

Theta-aggregation.

The techniques described above group items that possess an intrinsic
orientation (or acquire one early in the processing), in a direction that approximates their
local orientation. Theta-aggregation is the name we have given to the process of grouping a
set of similarly oriented items in a direction that differs from their intrinsic orientation, but in
a manner which uses it (e.g. figure 11a). The technique is to use very local grouping
measures to form place-tokens that have an orientation associated with the group rather than
with the local elements, and then to apply curvilinear aggregation to these tokens. The
difficult part about it is that measures of the "overlap" of two neighboring oriented items
depend upon the angle, theta, that the aggregate makes with each local unit (see figure 15).
So theta determines the aggregation process, but also depends upon it. For good data, it may
be quite unnecessary to know theta; aggregation of the places that each individual element
defines will suffice to compute the aggregate. In general however, one will need to take into
account the relation between the overall direction of the aggregate and the orientation of the
local elements. Viewed from a very abstract level, this computation may be regarded as a
process of solving a large number of rather simple equations.
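
To make the quantity theta concrete, the following Python sketch estimates it for a group that
has already been formed. It is a minimal illustration under assumptions of our own: items are
hypothetical `(x, y, orientation_in_degrees)` triples, the overall direction of the aggregate
is taken to be the principal axis of the item positions, and the local orientation is the
axial mean of the item orientations. The text does not say that the implemented system
computes theta this way.

```python
import math


def principal_direction(points):
    """Orientation in degrees, in [0, 180), of the principal axis of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return math.degrees(0.5 * math.atan2(2 * sxy, sxx - syy)) % 180.0


def theta(items):
    """Angle in [0, 90] between the aggregate direction and the mean local orientation."""
    direction = principal_direction([(x, y) for x, y, _ in items])
    # Orientations are axial (0 and 180 coincide), so average the doubled angles.
    c = sum(math.cos(math.radians(2 * o)) for _, _, o in items)
    s = sum(math.sin(math.radians(2 * o)) for _, _, o in items)
    local = (math.degrees(math.atan2(s, c)) / 2) % 180.0
    d = abs(direction - local) % 180.0
    return min(d, 180.0 - d)
```

A row of vertical strokes laid out horizontally gives theta near 90 degrees, while strokes laid
end to end along their own orientation give theta near 0, which is the case handled by
ordinary curvilinear aggregation.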

Grouping into neighborhoods and regions

The second category of grouping operations concerns the selection of a region
by the presence there of some distinguishing local property. We first examine the nature of
the local properties on which such grouping operations are based, and secondly we make
some brief comments about the grouping techniques that operate on them.

Semi-local measures. From an abstract point of view, the primal sketch is simply a large body
of data. There is therefore no difficulty in extracting from it certain measures and statistics,
computed from the parameters that are bound to the elements of the sketch. Such measures
provide a useful coarse description of a neighborhood in the image. They can be used to
control the type and depth of the analysis that is applied to a region, or to select
neighbourhoods for subsequent grouping into regions. In particular, we shall assume that
over moderately sized regions (0.5 to 1.0 degrees at foveal resolution) of the primal sketch,
the following distributions are available to processes that are capable of asking certain
straightforward statistical questions of them:

D0. The total amount of contour, and number of blobs, at different contrasts and intensities.

D1. ORIENTATION: the total number of elements at each orientation, and the total contour
length at each orientation.

D2. SIZE: distribution of the size parameters defined in the primal sketch.

D3. CONTRAST: distribution of the contrast of items in the primal sketch.

D4. SPATIAL DENSITY: spatial density of place-tokens defined in the different possible ways,
measured using a small selection of neighborhood sizes.

The straightforward statistical questions referred to above include such matters as whether
the distribution is uniform, or has one, two, three or more peaks; if peaks exist, where they
are and their relative sizes. If the distributions are very scattered (like orientation
distributions), the corresponding questions are whether the orientations are grouped in a
significant way, or are roughly uniformly spread out. It has been our experience that
straightforward histogram-based selection techniques suffice to drive the initial examination of
an image. For example, to examine the characteristics of the orientation distribution in an
image, one forms an orientation histogram based on ten degree wide orientation buckets. The
figure of ten degrees was obtained empirically, and appears to be suitable for all images. For
spatial grouping on the other hand, the scale at which one applies histogram-based techniques
depends upon the place-token density of the particular image being analysed. Once again, we
have not found it desirable to use elaborate statistical tests. If a property is significant, any
reasonable test would detect it. If a property is marginal, no statistical model can alter the
fact.
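
A histogram of this kind is easy to sketch. The Python below is an illustration only: the
input format (pairs of orientation in degrees and contour length) and the crude peak rule
controlled by the hypothetical parameter `min_ratio` are assumptions of ours, chosen in the
spirit of "any reasonable test" rather than taken from the implemented system. Only the ten
degree bucket width comes from the text.

```python
def orientation_histogram(items, bucket_width=10):
    """items: iterable of (orientation_degrees, contour_length) pairs.

    Returns (counts, lengths): the number of elements and the total contour
    length falling in each orientation bucket over the range 0 to 180 degrees.
    """
    n = 180 // bucket_width
    counts = [0] * n
    lengths = [0.0] * n
    for orientation, length in items:
        b = int((orientation % 180) // bucket_width)
        counts[b] += 1
        lengths[b] += length
    return counts, lengths


def peaks(hist, min_ratio=2.0):
    """Indices of buckets that are circular local maxima and well above the mean."""
    n = len(hist)
    mean = sum(hist) / n
    return [
        i for i in range(n)
        # hist[i - 1] wraps around at i == 0, so 0 and 180 degrees are neighbours.
        if hist[i] > hist[i - 1]
        and hist[i] >= hist[(i + 1) % n]
        and hist[i] >= min_ratio * mean
    ]
```

Because orientation is circular, the first and last buckets are treated as neighbours; a
histogram that ignored this would split a peak near 0 degrees in two.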

The final facility that we require is the ability to select from the image those
areas or items that give rise to obvious features of these distributions. For example, in figure
18 items at an orientation of 60 degrees are strongly predominant. We assume that items at
about this orientation can be selected from the primal sketch for examination by processes
that specialize in grouping such collections together. In another image, one might wish to
examine first all those items whose contrast was greater than a certain value. These facilities
are used only when tests indicate that they should be, and they can help the analysis of an
image by greatly restricting the number of elements in the primal sketch that need to be
considered by a particular process.
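
As a small illustration of such selection facilities, the Python sketch below filters a list of
primal sketch items by orientation or by contrast. The representation of an item as a
dictionary with `orientation` and `contrast` keys, and the default tolerance of ten degrees,
are assumptions made for the example.

```python
def select_by_orientation(items, target, tolerance=10.0):
    """Items whose orientation is within `tolerance` degrees of `target` (modulo 180)."""
    def axial_diff(a, b):
        d = abs(a - b) % 180.0
        return min(d, 180.0 - d)

    return [it for it in items if axial_diff(it["orientation"], target) <= tolerance]


def select_by_contrast(items, minimum):
    """Items whose contrast is greater than `minimum`."""
    return [it for it in items if it["contrast"] > minimum]
```

The only subtlety is the wrap-around: an item at 175 degrees is close to one at 2 degrees, so
the difference is taken modulo 180.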

Boundary of a group of place-tokens. The distributions D0 - D4, and the density of place-
tokens obtained from items in the primal sketch, can lead to the splitting of an image into
regions. The centres of figures 10a & 10b provide simple examples of this (see also Julesz
1971 pp. 105ff). O'Callaghan (1974a & b) surveyed the literature on dot-grouping studies, and
defined a local operator for obtaining boundary lines of clusters of dots. The idea is that the
shape and extent of the clusters are subsequently computed from the local boundary
elements.

Our experience has been that purely local methods can usually be improved by
adding to them a sensitivity to the "overall" direction of a boundary. The interaction between
local and global information resembles that shown in figure 12. The overall direction of a
group of place-tokens can be obtained cheaply by finding peaks in their spatial density or its
gradient. Such a mechanism allows one to obtain an overall description of the shape or
orientation of a group of places before a precise assignment of local boundary points has
been made. This is relatively easy to implement, and it has the advantages of speed and
economy that lead one to expect it in our own visual systems.

Relation to texture-vision discrimination

There are several current ideas on texture processing. Some authors have
used Fourier techniques, and in certain circumstances the spatial power spectrum can
successfully separate different regions (Bajcsy 19??). Others have constructed specialized
operators which sometimes discriminate between regions with different texture. Probably the
earliest example of this was the Roberts gradient (Roberts 1963). The most interesting and
comprehensive proposal is due to Julesz (Julesz (1962); Julesz, Frisch, Gilbert & Shepp
(1973); Julesz (1975)), who showed that visual textures that differ only in their third or
higher order statistical structure are rarely perceptually discriminable; whereas visual
textures that differ in their first or second-order statistics can usually be distinguished. The
important point about this finding lies in its demonstration of the essential simplicity of
texture discriminations. Although it gives little insight into how the processing is implemented,
it does imply that with a Volterra series expansion, all coefficients of terms whose order is
higher than 2 are zero.

The present theory includes texture discrimination with the other techniques
for extracting forms from the primal sketch, and asserts that texture discriminations are
actually implemented by the family of first-order discriminations and grouping processes that
act upon the primal sketch. The class of computations that these processes define differs
from but overlaps considerably with the class of all second-order operations operating on the
original intensity array. Julesz (1975) mentioned in an aside the possibility that "texture
vision may rest on first-order statistics of various simple feature extractors", but this idea
requires the concepts of the primal sketch and of recursively applied grouping before it can
be brought to fruition. The principal difference between the two approaches is that the
present theory is process-oriented, since it rests on the belief that early processing of visual
information is in fact implemented in this way. The second-order discrimination theory
provides a phenomenological description. As with many other problems of biological
information processing, it will be interesting to see whether the phenomenology can be
described accurately without explicitly defining the underlying computational processes.

So that the reader may form an intuitive grasp of the way in which the present
theory accounts for texture vision discriminations, let us re-examine some of the textures
devised by Julesz, and follow this with some examples of the texture analysis run on some
natural images. Firstly, consider figure 16. Julesz notes that in 16a, the two regions have
distinct second-order statistics, but not in figure 16b. Hence, according to his rule, the two
regions are distinguishable in 16a, but not in 16b. The present theory explains this as
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FIGURE 16. Examples of textures devised by Julesz. All four contain a square region that
differs from the background. (a) and (b) obey Julesz's conjecture; in (c), the second-order
statistical structure of the square differs from that of the background, yet we cannot
distinguish the two. In (d), the second-order structure is uniform, yet we can faintly
distinguish the square region. The present theory accounts for these examples, and defines a
set of discriminations that neither contains nor is contained by the set of all second-order
discriminations.



FIGURE 17. The spatial information of the primal sketch of the image CHAIR (figure 4a) is
shown in (a). (b) and (c) show two units that emerge after aggregation, and (d) gives the
skeleton of the chair to which this aggregation leads. (This skeleton was obtained by
selecting the longest edge from each aggregate, and adding the edge whose center lies at (30,
67).) By using the texture that is present in the image, the problem of divining the three-
dimensional shape of the object has been separated from the problem of recognizing its
surface structure (one takes (d) as its data, the other takes units like (b) or (c)). No
downward-flowing information was necessary to accomplish this.
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follows: orientation measures are the only distinguishing feature of the primal sketch
representation, because everything else has carefully been held constant. In 16b, the two
basic elements are related by a 180 degree rotation, and so the orientation statistics to which
they give rise are identical. Hence the two regions are indistinguishable. In 16a however,
there is more contour at 0 degrees than at 90 degrees in the central patch, but the opposite
is true in the surround. Hence the two regions are immediately distinguished.

The second example appears as figure 16c. Some of the modules in the pattern
have been reflected about a vertical line through their centers. Their second-order statistics
are therefore different. This is an example in which Julesz's generalization fails. The
orientation statistics of the contours, and of the local groups they form, are however
unchanged because only vertical and horizontal orientations are involved. Hence the present
theory predicts that the two regions should be indistinguishable without scrutiny, as indeed
they are. This establishes that the class of second order discriminations includes some
operations that are not included in the class defined here.
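
The two invariance arguments used above for figures 16b and 16c can be checked numerically.
The Python sketch below is an illustration under our own assumptions: texture elements are
represented as lists of straight segments, orientation statistics are total contour length per
15 degree bucket centred on multiples of 15 degrees, and the reflection is taken about the
line x = 0. None of these choices comes from the implemented system.

```python
import math


def orientation_stats(segments, bucket_width=15):
    """Total contour length per orientation bucket.

    segments: list of ((x0, y0), (x1, y1)). Orientation is taken modulo 180
    degrees, and buckets are centred on multiples of `bucket_width`.
    """
    hist = [0.0] * (180 // bucket_width)
    for (x0, y0), (x1, y1) in segments:
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        bucket = int(round(angle / bucket_width)) % len(hist)
        hist[bucket] += math.hypot(x1 - x0, y1 - y0)
    return hist


def rotate180(segments):
    """Rotate every segment by 180 degrees about the origin."""
    return [((-x0, -y0), (-x1, -y1)) for (x0, y0), (x1, y1) in segments]


def reflect_vertical(segments):
    """Reflect every segment about the vertical line x = 0."""
    return [((-x0, y0), (-x1, y1)) for (x0, y0), (x1, y1) in segments]
```

For an element built only from horizontal and vertical strokes, both transformations leave the
statistics unchanged. A single oblique stroke is enough to make the reflection visible in the
orientation statistics, which is why the restriction to vertical and horizontal orientations
matters in the argument about figure 16c.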

The aggregation technique that was illustrated in figure [illegible] provides an example
of a technique whose complexity is higher than second-order. Discrimination of the
distinguished region can just be made in figure 16d, and the reason seems to be that the dots
"string together" better there than in the background. This would be an unusual use of the
aggregation techniques, but it does allow us to distinguish the region from its surround even
though the second-order statistical structures of the two are identical. It does not however
allow us to be confident of the exact boundaries.

Examples of the analysis of some real images

In order to illustrate the usefulness of the theory, we shall now examine the
results of applying it to some images. Figure 17a shows the primal sketch of the chair whose
image appeared as figure 4a. The first thing to realize about this image is that it is textured
at all. The texture is so simple that one easily overlooks it, yet it exists in exactly the sense
of this article. The presence of the texture is suggested by the existence of three clear
peaks in the orientation histogram, and the texture itself is decoded by grouping nearby items
with similar orientations. Figures 17b & c show typical results of running this procedure on
this image.

Each of these aggregates can now be described simply by position, orientation
and extent, and this produces a skeleton of the outline of the chair (figure 17d). By
considering separately the structure of just one aggregate, one could go on to compute a
description of the surface structure of the material out of which the chair is made. Using one
autonomous technique, we have separated (but not of course solved) the problem of divining
the overall three-dimensional shape of the chair from the analysis of its surface properties.
This ability is vital if the organization of subsequent analysis is to be modular.

The next example shows a difficult case of theta-aggregation. The image
(figure 4d) is not very contrasty because it was taken from a photograph (Brodatz 1966 plate
[illegible]). The intensity values have been printed in figure 18a, and figure 18b shows the
spatial component of the primal sketch. Contours of all intensities, lengths and orientations are
shown, and as one would expect from an image of this complexity, 18b has a somewhat messy
appearance. Part of the mess can be removed by excising elements responsible for the
lowest-contrast peak in the contrast-distribution histogram, but the crucial clues come from
the orientation distribution. Table 2 provides rough information about the amount of contour
that is present at each orientation, from which it is evident that items at an orientation of
around 60 degrees predominate. The average length of items at this orientation is 13. These
coarse measures cause the texture analyzer to attempt to group the edges at this orientation.
Initially, the direction along which grouping should take place is unknown, so stringent local
grouping parameters are used. This leads to the primary cluster shown in figure 18c. From
this, an overall direction is obtained (-53 degs), and curvilinear aggregation then groups the
items into the stripes shown in 18d, e, f, g & h. This completes primary texture processing.
Once the primary stripes have been obtained, the same analysis operating recursively on
tokens for these stripes serves to relate them to one another. Notice that in this particular
image, some of the stripe information has been picked up directly from the intensity values
(see figure 18b). This would not be true of a more herring-bone texture, and the analysis
does not depend upon it. Our present system is successful at processing herring-bone
textures of similar complexity in which the two types of stripe have the same average
reflectance. Figure 19 demonstrates this. It shows the analysis of figure 4c, which is a
fragment of Dr. Eric Sandewall's waistcoat.

Finally, I give two examples of images that are simple enough for the
aggregation techniques to extract the important forms unaided. The local elements of the
primal sketch of the rod of figure 6 are grouped by the first two stages of curvilinear
aggregation into the units shown in figures 20a, b & c. The third stage assembles them into
the form shown in 20d. The reason why the first two stages cannot complete the job is
because of the alternatives near (33, [illegible]), and because the contrast across the top-left
portion of the form has the opposite sign from the contrast elsewhere.

Several types of analysis have been applied to the image of a toy bear (figure
21). The half-tone image (figure 4f) has been printed in 21a, and the intensity map is given in
21b. The primal sketch of this image is represented by 21c. The blobs extracted from this
image appear in figure 21d, and the routines for describing the spatial disposition of a small
number of places recognize that these form a (VEE FLAT) configuration (cf. figure 11b),
described relative to the default vertical axis. The contours that form the bear's face appear
in 21e, and 21f shows his muzzle. The extraction of the muzzle made use of the closed form
property, as well as discrepancies in contrast and fuzziness, while choosing between rival
segments near coordinate [illegible].

Discussion

Perhaps the most novel aspect of these ideas is the notion that the primal
sketch exists as a distinct and circumscribed symbolic entity, computed autonomously from the
image, and operated on repeatedly by a number of local geometrical processes, semi-local
measures, and first-order discriminations. The underlying reason why one needs to compute
such a thing is that in some sense a description like the primal sketch is much closer to what


TABLE 2. First-order measures taken over the primal sketch can control the execution of
grouping techniques. This table shows the orientation statistics of the primal sketch shown in
figure 18. For the purpose of illustration, the orientations have been divided into disjoint
buckets 15 degrees wide, and the total amount of contour and number of primal sketch
elements are shown for each of these buckets. Any criterion would judge 60 degrees to be an
important orientation. The processor therefore tries to group contours having this
orientation.

ORIENTATION (degrees)     0   15   30   45    60   75   90  105  120  135  150  165
NUMBER OF ITEMS          ?4    7   14   15   161   27   42   15   25   28   24   16
TOTAL CONTOUR LENGTH    ?32   54  132  116  2213  1?6  ?03  118  138  304  331  138


is really there (i.e. changes in reflectance) than the values of edge-shaped or bar-shaped
convolutions, which form a large and confusing set of primary measurements. It would
be almost impossible to deal with so huge a mass of data unless it were first organized into a
readable format.

The storage into which the primal sketch is written is the direct analog for the
class of images studied here of the Cyclopean retina that Julesz (1971) wrote of for binocular
vision. More subjectively, what it holds corresponds very closely to the "image" that one is
conscious of. This reflects the computational hypothesis that all subsequent analysis reads
this primal sketch, not the data from which it was computed. The primal sketch therefore acts
in a genuine sense as the interface at which visual analysis becomes a purely symbolic affair.

Implications for neurophysiology

The images studied here are impoverished by their inherent lack of movement
or binocular disparity. Extreme caution is needed when attempting to make predictions from
such a theory, because of the power of these two types of information. For example, a linear
cell with a center-surround receptive field is a hopeless blob-detector on its own. The frog's
fly-catching system only works because the additional constraint of relative motion is added
(Barlow 1953, Lettvin et al. 1959). Movement information together with some extra circuitry
might even turn a linear simple cell with a bar-shaped receptive field into a passable detector
of bars in an image. But a simplistic scheme of this sort, though possibly acceptable to a cat,
would be of little use for deciphering a motionless scene. It is therefore reasonable to expect
that something like a primal sketch is computed, at least by the higher primates. If it is, the
cells that represent the primal sketch should exhibit the consequences of algorithms like
peak-matching, the selection criterion, and the (otherwise surprising) inter-orientation
interactions that are central to its construction. One would also expect grouping processes
that use disparity or motion information to take as their input the primal sketch and at least
some of the classes of tokens obtained from it (Marr 1974).

At a higher level, one would expect to find experimental evidence of the
aggregation processes that the theory predicts should act upon the primal sketch to
decompose it into unit forms. Some of these processes have natural neural representations,
and some do not. For curvilinear and theta-aggregation, one would expect to find a cell that
marks the overall direction of aggregation independently of the orientation of the local
elements. One would also expect to find cells that represent place-tokens (recognizable by
their insensitivity to what is at the place); and cells for carrying pieces of the local first-
order and spatial-density measures that are important for texture-based definition of regions.
The design of the most likely neural representation of these processes is not straightforward.

The influence of higher-level knowledge and purpose on
visual information processing

There are two broader implications of the theory that are worth mentioning.
Firstly, the four principles stated at the beginning of the article have survived intact, and
their guidance has been valuable. The principle of least commitment has played an especially
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FIGURE 19. The analysis of this herringbone pattern (figure 4c) demonstrates that the
methods for distinguishing two texture regions do not depend on their having different
average reflectances. (a) shows the printed image, and (b) the spatial component of the
primal sketch. Typical extracted stripes are shown in (c) and (d).

FIGURE 20. The first two stages of curvilinear aggregation have been run on the primal sketch
of the rod shown in figure 7, and they produced the elements (a), (b) and (c). Once larger
units have been obtained, the governing parameters can be relaxed, and the elliptical form (d)
is obtained by the third step. Up to this point, the system has neither computed nor used any
descriptor of the form's overall shape.

FIGURE 21. The image of a toy bear (figure 4f) has been printed in (a), and its intensity map
appears in (b). The spatial component of the primal sketch is illustrated in (c). The three
principal forms extracted from (c) appear in (d), (e) and (f). The items in (d) are classed as
BLOBS, and the configuration that they form is recognized as a VEE (figure 11b) with modifier
FLAT. The axis relative to which this configuration was computed is the vertical (default
value). The outline of the bear (e), and of his muzzle (f) are simple enough to have been
extracted using only the techniques described in this article. The closed form property was
used to help decide between competing segments at coordinate (50, [illegible]). (The vertical
appears as the negative x axis, because this image was taken with the camera on its side.)

important role, by its pressure on us to design a system that does not usually do anything
wrong. It caused us to abandon ideas about "trigger features" in favour of the computation of
a true description, which led in turn to the gradual elucidation of the processes that are
necessary to read it. The result is bulky rather than complex, and requires prodigious
computing power but little computing sophistication (it could be implemented without difficulty
in a stackless machine). There can however be no doubt that in terms of sheer processing
power, the human visual system must be spectacularly well-endowed.

The second implication of interest concerns the structure of subsequent
recognition processes. If non-attentive vision may be implemented successfully by
approximately the set of methods defined in this article, it means that visual "forms" can
usually be extracted from the image by using knowledge-free techniques. In other words, the
extraction of a visual form can usually precede its description. From this it follows that it is
usually easy to compute a coarse description of a form before having any idea about what the
form is.

If this is true, it greatly simplifies the design of subsequent recognition
processes, because it means that they too can be made modular. For example, the ability to
compute a coarse description of a form allows one to describe the shape of a forest without
first computing detailed descriptions of all the trees; or to compute the shape of the cluster
of blobs that forms a distant village independently of deciding that some of those blobs are
actually buildings and that the cluster is therefore a village. In the more mundane example of
figure 21e, one can compute that the overall shape of the top form is roughly ovoidal without
first having to segment out and describe separately the bumps that are the bear's ears. The
autonomy of early visual processing permits the role of higher level knowledge to be very
restricted, and different in kind from its intervention in programs like Shirai's (1973).
Downward-flowing information will not affect the line finding stage (the computation of the
primal sketch) at all. Its most usual modus operandi is in choosing which processes are to be
used to read the primal sketch - for example by specifying which feature predicates should be
used to select the parts of current interest. It can also apply certain limited
kinds of flags to critical segments during their aggregation into forms (as in the image PLANT).
The coupling between higher-level knowledge and the form-extraction processes is however
much weaker than the coupling between the different form-extraction processes.

It is clearly desirable to have some control over which of the possible forms in
a figure should be delivered at a given moment from the primal sketch. For example, in the
image BEAR there are three possible major forms: the outline of the head, the muzzle, and
the three blobs that represent his eyes and nose. It seems probable that only one of these
should be made available at a time, and this in turn raises interesting questions about the
order in which it is done, the way in which the three forms and their relative positions are
described, and the way in which these descriptions trigger a larger datastructure and are
absorbed by it. In living systems, which are powerful enough to operate in real time, the
control of the direction of gaze may be rather closely related to the order in which these
events take place.


Acknowledgements: This study would not have been possible without the advanced and
flexible computing facilities that are available at the Artificial Intelligence Laboratory. I thank
Profs. M. Minsky and S. Papert for inviting me to the Laboratory; Hassan Alam, Gary Dudley,
and especially Ken Forbus for programming assistance; Dr. B. Julesz and Pion Publications for
permission to reproduce figure 15; and Karen Prendergast for preparing the drawings. The
material described in this paper covers M.I.T. A.I. Lab. Memos 324 & 334. This work was
conducted at the Artificial Intelligence Laboratory, a Massachusetts Institute of Technology
research program supported in part by the Advanced Research Projects Agency of the
Department of Defense, and monitored by the Office of Naval Research under Contract number
N00014-75-C-0643.
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